
    Generic and Specialized Word Embeddings for Multi-Domain Machine Translation

    Supervised machine translation works well when the train and test data are sampled from the same distribution. When this is not the case, adaptation techniques help ensure that the knowledge learned from out-of-domain texts generalises to in-domain sentences. We study here a related setting, multi-domain adaptation, where the number of domains is potentially large and adapting separately to each domain would waste training resources. Our proposal transposes to neural machine translation the feature expansion technique of Daumé III (2007): it isolates domain-agnostic from domain-specific lexical representations, while sharing most of the network across domains. Our experiments use two architectures and two language pairs: they show that our approach, while simple and computationally inexpensive, outperforms several strong baselines and delivers a multi-domain system that successfully translates texts from diverse sources.
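
    As a rough illustration of the idea, the sketch below (a minimal PyTorch mock-up under our own naming, not the authors' implementation) concatenates a shared, domain-agnostic embedding with a per-domain embedding for each token:

```python
import torch
import torch.nn as nn

class DomainAwareEmbedding(nn.Module):
    """Each token gets a shared vector plus a vector from its domain's table."""

    def __init__(self, vocab_size, n_domains, dim_shared, dim_domain):
        super().__init__()
        # One embedding table shared across all domains (domain-agnostic part) ...
        self.shared = nn.Embedding(vocab_size, dim_shared)
        # ... and one smaller table per domain (domain-specific part).
        self.domain = nn.ModuleList(
            nn.Embedding(vocab_size, dim_domain) for _ in range(n_domains)
        )

    def forward(self, token_ids, domain_id):
        # Concatenate the generic and domain-specific representations,
        # mirroring the [shared; domain] feature-augmentation trick.
        return torch.cat(
            [self.shared(token_ids), self.domain[domain_id](token_ids)], dim=-1
        )

emb = DomainAwareEmbedding(vocab_size=32000, n_domains=4, dim_shared=384, dim_domain=128)
tokens = torch.randint(0, 32000, (2, 7))  # a batch of 2 sentences, 7 tokens each
vectors = emb(tokens, domain_id=1)        # shape: (2, 7, 512)
```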

    N-code: an open-source Bilingual N-gram SMT Toolkit

    This paper describes Ncode, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models of bilingual units (tuples). This toolkit includes tools for extracting tuples, estimating models and performing translation. It can be easily coupled to several other open source toolkits to yield a complete SMT pipeline. In this article, we review the main features of the toolkit and explain how to build a translation engine with Ncode. We also report a short comparison with the widely known Moses system. Results show that Ncode outperforms Moses in terms of memory requirements and translation speed. Ncode also achieves slightly higher accuracy results.
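
    To make the modelling idea concrete, here is a toy Python sketch, not part of N-code itself, that treats each bilingual tuple as a single token and estimates a maximum-likelihood n-gram model over tuple sequences (real toolkits add smoothing, pruning and back-off):

```python
from collections import Counter

def train_tuple_ngram(tuple_corpus, n=2):
    """tuple_corpus: list of sentences, each a list of (source, target) tuples."""
    ngram_counts, context_counts = Counter(), Counter()
    for sentence in tuple_corpus:
        padded = [("<s>", "<s>")] * (n - 1) + sentence
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    # Maximum-likelihood estimate of p(tuple | previous n-1 tuples).
    return {ng: c / context_counts[ng[:-1]] for ng, c in ngram_counts.items()}

corpus = [[("la", "the"), ("maison", "house")],
          [("la", "the"), ("voiture", "car")]]
model = train_tuple_ngram(corpus, n=2)
print(model[(("la", "the"), ("maison", "house"))])  # 0.5
```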

    Revisiting Multi-Domain Machine Translation

    When building machine translation systems, one often needs to make the best of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has attracted much recent work, which falls under the general umbrella of transfer learning. In this study, we revisit multi-domain machine translation, with the aim of formulating the motivations for developing such systems and the associated expectations with respect to performance. Our experiments with a large sample of multi-domain systems show that most of these expectations are hardly met, and suggest that further work is needed to better analyse the current behaviour of multi-domain systems and to make them fully deliver on their promises.
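
    The kind of analysis this calls for can be pictured as scoring every system on every domain's test set, so that multi-domain claims are checked against a full system-by-domain matrix rather than a single aggregate number. A hedged Python sketch (the translate callables and the use of sacrebleu are our assumptions, not the authors' setup):

```python
import sacrebleu

def domain_matrix(systems, test_sets):
    """systems: {name: translate_fn}; test_sets: {domain: (sources, references)}.

    Returns BLEU for every (system, domain) pair, so one can check whether a
    multi-domain system really holds up on each of its domains.
    """
    scores = {}
    for sys_name, translate in systems.items():
        for domain, (sources, references) in test_sets.items():
            hypotheses = [translate(s) for s in sources]
            scores[(sys_name, domain)] = sacrebleu.corpus_bleu(
                hypotheses, [references]
            ).score
    return scores
```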

    Multi-Pivot Translation by System Combination

    This paper describes a technique to exploit multiple pivot languages when using machine translation (MT) on language pairs with scarce bilingual resources, or where no translation system for a language pair is available. The principal idea is to generate intermediate translations in several pivot languages, translate them separately into the target language, and generate a consensus translation out of these using MT system combination techniques. Our technique can also be applied when a translation system for a language pair is available, but is limited in its translation accuracy because of scarce resources. Using statistical MT systems for the 11 different languages of Europarl, we show experimentally that a direct translation system can be replaced by this pivot approach without a loss in translation quality if about six pivot languages are available. Furthermore, we can already improve an existing MT system by adding two pivot systems to it. The maximum improvement was found to be 1.4% abs. in BLEU in our experiments for 8 or more pivot languages.
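
    A minimal Python sketch of the pipeline (the translate stub stands in for real MT engines, and the token-overlap vote is a crude stand-in for proper confusion-network system combination):

```python
def translate(sentence, src, tgt):
    """Stand-in for an MT engine for the pair src -> tgt; replace with a real one."""
    return sentence  # identity placeholder so the sketch runs end to end

def token_f1(a, b):
    # Crude sentence similarity: F1 over the two token sets.
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))

def multi_pivot_translate(sentence, src, tgt, pivots):
    # One candidate per pivot language: src -> pivot -> tgt.
    candidates = [translate(translate(sentence, src, p), p, tgt) for p in pivots]
    # Consensus: keep the candidate that agrees most with all the others
    # (a simple minimum-Bayes-risk-style vote).
    best = max(range(len(candidates)),
               key=lambda i: sum(token_f1(candidates[i], candidates[j])
                                 for j in range(len(candidates)) if j != i))
    return candidates[best]

print(multi_pivot_translate("la maison bleue", "fr", "en", ["de", "es", "it"]))
```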